To answer this question, we use the “History of Philosophy” dataset.
First, take a look at the text data. We can easily tell that the text is separated into different books by different authors from different schools.
## [1] "title" "author"
## [3] "school" "sentence_spacy"
## [5] "sentence_str" "original_publication_date"
## [7] "corpus_edition_date" "sentence_length"
## [9] "sentence_lowered" "tokenized_txt"
## [11] "lemmatized_str"
There are 13 schools in total. We sort the text by school and original publication date to see how the text is distributed.
names(table(df[,"school"]))
## [1] "analytic" "aristotle" "capitalism" "communism"
## [5] "continental" "empiricism" "feminism" "german_idealism"
## [9] "nietzsche" "phenomenology" "plato" "rationalism"
## [13] "stoicism"
datatable(as.matrix(table(df[,"school"])))
datatable(as.matrix(table(df[,"original_publication_date"])))
df$published_date<-floor(df$original_publication_date)
df%>%
group_by(school,original_publication_date)%>%
summarise(count = n())
## `summarise()` has grouped output by 'school'. You can override using the `.groups` argument.
a <- ggplot(data = df[df$published_date<0,], aes(x = published_date, fill = school)) +geom_bar(position = "dodge")+scale_fill_manual(values = c('#B3CDE3','#FBB4AE'))
b <- ggplot(data = df[df$published_date>0&df$published_date<1600,], aes(x = published_date, fill = school)) +geom_bar(position = "dodge")+scale_fill_manual(values = c('#DECBE4'))
p1<-ggplotly(a)
p2<-ggplotly(b)
subplot(p1, p2)
c <- ggplot(data = df[df$published_date>1600&df$published_date<1986,], aes(x = published_date, fill = school)) +geom_bar(position = "dodge")
ggplotly(c)
As the charts show, different schools appear in different time periods. Before the Common Era, Plato comes first, followed by Aristotle. Around 100-200 CE, Stoicism takes over the field. No other school appears after that until 1600. Rationalism appears in 1637 and lasts until 1710. Meanwhile, Empiricism appears in 1674 and disappears before 1780. In the late 18th century, Capitalism and Feminism start to appear, and German_idealism plays a very important part around 1800. Communism and Nietzsche are the two most common schools in the 19th century, and both stop before 1900. In the 20th century, Analytic and Phenomenology publish works steadily, followed by a large amount of Continental text and a small amount of Feminism text later in the century.
By analyzing the schools over time, we can get a rough idea of what philosophers focused on in each period.
We first merge the text data by school into a corpus and remove stopwords and punctuation using the tm package. We then build a TermDocumentMatrix (the transpose of a document-term matrix: each row is a term, each column is a document, and each cell holds the count of that term in that document) to see the word frequencies.
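The idea of a term-document matrix can be sketched in a few lines. The snippet below is a minimal illustration in plain Python, not the tm code used for this report; the toy texts, the short stopword set, and the helper name `build_tdm` are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stopword set; real stopword lists are much longer.
STOPWORDS = {"the", "is", "a", "of", "and", "to", "in"}

def build_tdm(docs):
    """Return {term: {doc_name: count}} for a dict of {doc_name: text}."""
    tdm = {}
    for name, text in docs.items():
        # Lowercase and keep alphabetic tokens only, which drops punctuation.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts = Counter(t for t in tokens if t not in STOPWORDS)
        for term, n in counts.items():
            tdm.setdefault(term, {})[name] = n
    return tdm

# Toy stand-ins for the merged per-school texts.
docs = {
    "plato": "The soul is good, and the good is knowledge.",
    "stoicism": "Thee and thy soul; nature guides thee.",
}
tdm = build_tdm(docs)
print(tdm["soul"])  # counts of "soul" in each school
print(tdm["thee"])  # counts of "thee" in each school
```

Looking up a row of this structure corresponds to searching for a word in the datatable and reading its count per school.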
Using datatable in R, we can look up the count of a given word in a given document. For example, the word “thee” appears 306 times in the Stoicism text.
Each school’s wordcloud contains several informative words, so we can extract a few keywords for each school, ordered by time.
The Plato school is mostly concerned with think, say, things, socrates, good, man, soul.
set.seed(123)
wordcloud(words = dat_plato$word, freq = dat_plato$freq, scale = c(4, 0.2),min.freq = 10, max.words=100, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
However, some difficulties are cumbersome to overcome using R alone. For example, the wordcloud contains many uninformative words (e.g. one, thing). To address this, I use Python packages to clean the data further, following the process described in the following section. As the following graph shows, the keywords become clearer.
Key words of Plato: think, things, Socrates, good, people, soul, knowledge.
Most frequent words in the Plato texts.
The Aristotle school is mostly concerned with man, time, animals, body, parts.
wordcloud(words = dat_aristotle$word, freq = dat_aristotle$freq, scale = c(4, 0.2),min.freq = 10, max.words=100, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Aristotle: case, like, reason, animal, nature, fact, body.
The Stoicism school contains more archaic English words such as thou, thee, thy, doth.
wordcloud(words = dat_stoicism$word, freq = dat_stoicism$freq, scale = c(4, 0.2),min.freq = 10, max.words=100, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Stoicism: thou, nature, thee, world, mind, reason, good, life.
The most frequent words in the Rationalism school are god, body, nature, mind, reason.
wordcloud(words = dat_rationalism$word, freq = dat_rationalism$freq, scale = c(4, 0.2),min.freq = 10, max.words=150, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Rationalism: reason, mind, bodi, know, soul, cause, think, nature, certain.
The most frequent words in the Empiricism school are idea, mind, may, knowledge.
wordcloud(words = dat_empiricism$word, freq = dat_empiricism$freq, scale = c(4, 0.2),min.freq = 10, max.words=150, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Empiricism: idea, object, reason, think, nature, mind, power, passion.
The most frequent words in the Capitalism school are price, money, labour, value, capital, country, trade.
wordcloud(words = dat_capitalism$word, freq = dat_capitalism$freq, scale = c(4, 0.2),min.freq = 10, max.words=140, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Capitalism: country, employ, great, time, trade, profit, product.
The most frequent words in the Feminism school are woman, man, love, black, can, life, mother.
wordcloud(words = dat_feminism$word, freq = dat_feminism$freq, scale = c(4, 0.2),min.freq = 10, max.words=165, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Feminism: woman, love, mother, husband, nature, like, life.
The most frequent words in the German_idealism school are concept, nature, self, existence, consciousness.
wordcloud(words = dat_german_idealism$word, freq = dat_german_idealism$freq, scale = c(4, 0.2),min.freq = 10, max.words=100, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of German_idealism: determine, concept, object, exist, nature, conscious, think.
The most frequent words in the Communism school are labour, value, production, work, capital, power, social.
wordcloud(words = dat_communism$word, freq = dat_communism$freq, scale = c(4, 0.2),min.freq = 10, max.words=100, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Communism: work, labour, hand, manufacture, capital, product, capitalist, employ.
The most frequent words in the Nietzsche school are thou, man, Zarathustra, world, life.
wordcloud(words = dat_nietzsche$word, freq = dat_nietzsche$freq, scale = c(4, 0.2),min.freq = 10, max.words=100, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Nietzsche: life, like, Zarathustra, love, know, world, christian.
The most frequent words in the Analytic school are say, may, sense, theory, true, world.
wordcloud(words = dat_analytic$word, freq = dat_analytic$freq, scale = c(4, 0.2),min.freq = 10, max.words=100, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Analytic: case, know, think, mean, differ, certain, fact.
The most frequent words in the Phenomenology school are world, dasein, time, present, sense, knowledge.
wordcloud(words = dat_phenomenology$word, freq = dat_phenomenology$freq, scale = c(4, 0.2),min.freq = 10, max.words=100, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Phenomenology: world, object, think, mean, possible, experience, understand.
The most frequent words in the Continental school are madness, language, time, form, order, nature, thought.
wordcloud(words = dat_continental$word, freq = dat_continental$freq, scale = c(4, 0.2),min.freq = 10, max.words=100, random.order=FALSE, rot.per=0.30, colors=brewer.pal(8, "Dark2"))
Key words of Continental: form, mean, mad, differ, nature, order, time, think, language.
To better address the question raised at the beginning, what philosophy is mainly concerned with, we use Python to apply the following process to all the text data and to draw the cleaned wordclouds above.
Reference: https://towardsdatascience.com/topic-modeling-and-latent-dirichlet-allocation-in-python-9bf156893c24
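The exact cleaning steps are not reproduced in this report. As a rough sketch, a pipeline in the spirit of the referenced tutorial (tokenize, drop short tokens and stopwords, then reduce words to a stem) might look like the following. The stopword set, the `min_len` threshold, and the suffix-stripping helper `crude_stem` are hypothetical stand-ins; a real run would more likely use a library stopword list and a proper stemmer, which is what produces stems such as "bodi" and "natur".

```python
import re

# Hypothetical, deliberately small stopword set for illustration.
STOPWORDS = {"the", "and", "one", "thing", "that", "this", "with", "are", "for"}

def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(sentence, min_len=3):
    """Lowercase, tokenize, filter, and stem one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    # Drop very short tokens and stopwords before stemming.
    kept = [t for t in tokens if len(t) >= min_len and t not in STOPWORDS]
    return [crude_stem(t) for t in kept]

print(preprocess("One thing the philosophers discussed: the objects of knowledge."))
```

Filtering words such as "one" and "thing" at this stage is what removes the uninformative entries seen in the first set of wordclouds.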
We plotted the top words after processing. This gives us a peek at the main subjects philosophers talk about throughout the history of philosophy.
Most frequent words across all philosophers.
Using LDA, we can summarize the whole text into several topics. First, we combine all documents and fit a topic model with fewer than 13 topics, because the earlier inspection showed that some schools share similar keywords. We choose 7 topics and plot them using pyLDAvis in Python.
Each bubble represents a topic. The larger the bubble, the larger the share of sentences in the corpus that are about that topic. Blue bars represent the overall frequency of each word in the corpus. Red bars give the estimated number of times a given term was generated by a given topic.
As the image shows, the word ‘nature’ occurs about 20,000 times in the corpus, and about 7,000 of those occurrences fall within topic 1. The word with the longest red bar is the word used most by the schools belonging to that topic. Topic0 accounts for the largest share of the dataset, with object and world as its most discussed terms. Topic1 is also clearly distinguishable, covering labour, product, value, and related terms.
\[ Topic0: 0.009*object + 0.008*world + 0.007*determin + 0.007*mean + 0.006*think + 0.006*natur + 0.006*exist + 0.006*possibl + 0.006*concept + 0.006*time \]
\[ Topic1: 0.009*labour + 0.007*valu + 0.005*natur + 0.005*product + 0.005*capit + 0.005*time + 0.005*work + 0.005*differ + 0.004*produc + 0.004*form \]
\[ Topic2: 0.006*women + 0.005*woman + 0.004*time + 0.004*natur + 0.004*like + 0.004*life + 0.003*know + 0.003*good + 0.003*world + 0.003*think \]
\[ Topic3: 0.006*natur + 0.006*differ + 0.006*bodi + 0.006*reason + 0.006*say + 0.006*think + 0.006*idea + 0.005*time + 0.005*object + 0.005*know \]
\[ Topic4: 0.007*think + 0.006*natur + 0.006*idea + 0.006*object + 0.005*differ + 0.005*mean + 0.005*case + 0.005*time + 0.005*know + 0.004*relat \]
\[ Topic5: 0.007*good + 0.006*think + 0.006*say + 0.006*time + 0.006*natur + 0.005*differ + 0.005*case + 0.005*bodi + 0.005*know + 0.005*like \]
\[ Topic6: 0.007*natur + 0.007*idea + 0.006*object + 0.005*reason + 0.005*think + 0.005*exist + 0.005*differ + 0.005*time + 0.004*bodi + 0.004*relat \]
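To make the topic lists above more concrete, here is a minimal sketch of how LDA can assign words to topics and produce `weight*term` summaries. It is a plain-Python collapsed Gibbs sampler written for illustration, not the library routine used for this report (the report does not say which implementation was used). The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are all assumptions.

```python
import random

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return P(word | topic) per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                        # n(d, k)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                                      # n(k)
    assign = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic given all other tokens.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed estimate of P(word | topic).
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]

# Toy, already-preprocessed documents (lists of tokens).
docs = [
    ["labour", "value", "capital", "labour"],
    ["value", "capital", "product", "labour"],
    ["soul", "good", "knowledge", "soul"],
    ["good", "soul", "think", "knowledge"],
]
topics = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topics):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"Topic{k}:", " + ".join(f"{dist[w]:.3f}*{w}" for w in top))
```

Each weight is the estimated probability of a term within a topic, which is why the values in one topic sum to 1 over the whole vocabulary and why the top-10 weights listed above are small.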
After basic text mining and data analysis, we visualized the text data and modeled its topics. We can conclude that:
Before the Common Era, philosophers were already discussing nature, thinking, and knowledge, and many concepts already existed. These topics have remained common in most schools throughout history.
The Stoicism and Nietzsche texts make heavy use of archaic English (thou, thee).
From the late 18th century onward, the Capitalism and Communism schools discuss labour, value, production, work, capital, power, and social.
Capitalism and Feminism have very distinctive keywords and topics, which may be why they lasted for quite a long time despite a relatively small amount of text.
For further exploration, topic modeling within each school and sentiment analysis are also worth doing. Because of time constraints, we leave these for future work.